A Natural Language Generator for Minority Languages

Authors

  • Tod Allman
  • Stephen Beale
Abstract

The Bible Translator’s Assistant (TBTA) is a natural language generator (NLG) designed specifically for field linguists doing translation work in minority languages. In particular, TBTA is intended to generate drafts of the narrative portions of the Bible as well as numerous community development articles in a very wide range of languages. TBTA uses the rich interlingua approach. The semantic representations developed for TBTA consist of a controlled English based metalanguage augmented by a feature system designed specifically for minority languages. The grammar in TBTA has two sections: a restructuring grammar and a synthesizing grammar. The restructuring grammar restructures the semantic representations in order to produce a new underlying representation that is appropriate for a particular target language. Then the synthesizing grammar synthesizes the final surface forms. To date TBTA has been tested with four languages: English, Korean, Jula (Cote d’Ivoire) and Kewa (Papua New Guinea). Experiments with the Jula text indicate that TBTA triples the productivity of professional mother tongue translators without any loss of quality.

A model of TBTA is shown below in Figure 1.

Figure 1. Underlying model of The Bible Translator’s Assistant

1. The Semantic Representations

The development of an adequate method of meaning representation for TBTA’s source texts proved to be a challenge. Formal semantics (Cann, 1993; Rosner, 1992), conceptual semantics (Jackendoff, 2002) and generative semantics (Lakoff, 1975) were each considered but found inadequate. Using the foundational principles of Natural Semantic Metalanguage theory, a set of semantically simple English molecules was identified in a principled manner (Wierzbicka, 1996; Goddard, 1998). These semantic molecules serve as the primary lexemes in TBTA’s ontology.
The ontology also includes semantically complex lexemes, but each of those lexemes has an associated expansion rule that automatically expands the complex concept in terms of the semantic molecules for those target languages that don’t have a lexicalized semantic equivalent. The feature set developed for TBTA encodes semantic, syntactic and discourse information. Each feature is an exhaustive etic list of the values pertinent to the world’s languages. For example, each nominal is marked for Number, and the possible values are Singular, Dual, Trial, Quadrial and Plural. Each of these values is necessary because some languages morphologically distinguish all five of these categories. Examples of some of the features and their values are listed below in Tables 1 through 4.
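
The two mechanisms described in this section, the Number feature and the expansion rules for complex lexemes, can be illustrated with a minimal Python sketch. It is not TBTA's actual implementation: the names `Number`, `Nominal`, `EXPANSION_RULES`, `expand_for_target` and `realize_number` are hypothetical, the `shepherd` entry and its molecules are invented for illustration, and the fallback to Plural for values a language does not mark is an assumption rather than something the text states.

```python
from dataclasses import dataclass
from enum import Enum


class Number(Enum):
    """The five Number values listed in the text."""
    SINGULAR = "Singular"
    DUAL = "Dual"
    TRIAL = "Trial"
    QUADRIAL = "Quadrial"
    PLURAL = "Plural"


@dataclass(frozen=True)
class Nominal:
    """A nominal in the semantic representation, marked for Number."""
    lexeme: str
    number: Number


# Hypothetical expansion rule: one complex lexeme mapped to simpler molecules.
EXPANSION_RULES = {
    "shepherd": ["person", "take-care-of", "sheep"],
}


def expand_for_target(concepts, target_lexicon):
    """Expand complex concepts that the target language has not lexicalized."""
    result = []
    for concept in concepts:
        if concept in target_lexicon or concept not in EXPANSION_RULES:
            # Either the target has a word for it, or it is already a molecule.
            result.append(concept)
        else:
            result.extend(EXPANSION_RULES[concept])
    return result


def realize_number(number, distinguished):
    """Map a Number value onto the categories a target language marks.

    Assumption: values the language does not distinguish fall back to Plural.
    """
    return number if number in distinguished else Number.PLURAL


if __name__ == "__main__":
    two_sheep = Nominal("sheep", Number.DUAL)
    english_numbers = {Number.SINGULAR, Number.PLURAL}
    print(realize_number(two_sheep.number, english_numbers))
    print(expand_for_target(["shepherd", "walk"], target_lexicon={"walk"}))
```

Keeping the source representation maximally specific (all five Number values, complex lexemes intact) and narrowing it per target language mirrors the section's point that the feature values are an exhaustive list, while each language uses only the distinctions it actually makes.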

Publication date: 2006